AITopics

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(2 more...)

Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, Chris Pal

Unsupervised Depth Estimation, 3D Face Rotation and Replacement

Neural Information Processing SystemsFeb-13-2026, 21:20:54 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, keypoint, machine learning, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)

Joel Ruben Antony Moniz, Christopher Beckham, Simon Rajotte, Sina Honari, Chris Pal

Unsupervised Depth Estimation, 3D Face Rotation and Replacement

Neural Information Processing SystemsNov-20-2025, 18:38:07 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, keypoint, machine learning, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)

Neural Information Processing SystemsOct-10-2025, 20:13:41 GMT

ea011cd13bcdf7ca38506c844dccfee8-Paper-Conference.pdf

background image, dataset, depth value, (15 more...)

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Gnettner, Felix, Kirch, Claudia, Nieto-Reyes, Alicia

seMCD: Sequentially implemented Monte Carlo depth computation with statistical guarantees

arXiv.org Machine LearningJul-9-2025

Statistical depth functions provide center-outward orderings in spaces of dimension larger than one, where a natural ordering does not exist. The numerical evaluation of such depth functions can be computationally prohibitive, even for relatively low dimensions. We present a novel sequentially implemented Monte Carlo methodology for the computation of, theoretical and empirical, depth functions and related quantities (seMCD), that outputs an interval, a so-called seMCD-bucket, to which the quantity of interest belongs with a high probability prespecified by the user. For specific classes of depth functions, we adapt algorithms from sequential testing, providing finite-sample guarantees. For depth functions dependent on unknown distributions, we offer asymptotic guarantees using non-parametric statistical methods. In contrast to plain-vanilla Monte Carlo methodology the number of samples required in the algorithm is random but typically much smaller than standard choices suggested in the literature. The seMCD method can be applied to various depth functions, covering multivariate and functional spaces. We demonstrate the efficiency and reliability of our approach through empirical studies, highlighting its applicability in outlier or anomaly detection, classification, and depth region computation. In conclusion, the seMCD-algorithm can achieve accurate depth approximations with few Monte Carlo samples while maintaining rigorous statistical guarantees.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2507.06227

Country:

Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > Spain > Cantabria (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial IntelligenceMar-17-2025

MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs

Daxberger, Erik, Wenzel, Nina, Griffiths, David, Gang, Haiming, Lazarow, Justin, Kohavi, Gefen, Kang, Kai, Eichner, Marcin, Yang, Yinfei, Dehghan, Afshin, Grasch, Peter

Multimodal large language models (MLLMs) excel at 2D visual understanding but remain limited in their ability to reason about 3D space. In this work, we leverage large-scale high-quality 3D scene data with open-set annotations to introduce 1) a novel supervised fine-tuning dataset and 2) a new evaluation benchmark, focused on indoor scenes. Our Cubify Anything VQA (CA-VQA) data covers diverse spatial tasks including spatial relationship prediction, metric size and distance estimation, and 3D grounding. We show that CA-VQA enables us to train MM-Spatial, a strong generalist MLLM that also achieves state-of-the-art performance on 3D spatial understanding benchmarks, including our own. We show how incorporating metric depth and multi-view inputs (provided in CA-VQA) can further improve 3D understanding, and demonstrate that data alone allows our model to achieve depth perception capabilities comparable to dedicated monocular depth estimation models. We will publish our SFT dataset and benchmark.

large language model, machine learning, natural language, (19 more...)

2503.13111

Country:

South America > Brazil (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial IntelligenceMar-5-2025

Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments

Deng, Jie, Lang, Fengtian, Yuan, Zikang, Yang, Xin

Direct Sparse Odometry with Continuous 3D Gaussian Maps for Indoor Environments Jie Deng 1, Fengtian Lang 1, Zikang Y uan 2 and Xin Y ang 1 Abstract -- Accurate localization is essential for robotics and augmented reality applications such as autonomous navigation. Vision-based methods combining prior maps aim to integrate LiDAR-level accuracy with camera cost efficiency for robust pose estimation. Existing approaches, however, often depend on unreliable interpolation procedures when associating discrete point cloud maps with dense image pixels, which inevitably introduces depth errors and degrades pose estimation accuracy. We propose a monocular visual odometry framework utilizing a continuous 3D Gaussian map, which directly assigns geometrically consistent depth values to all extracted high-gradient points without interpolation. Evaluations on two public datasets demonstrate superior tracking accuracy compared to existing methods. We have released the source code of this work for the development of the community. I NTRODUCTION Visual odometry (VO)/visual-inertial odometry (VIO) is a crucial capability in a wide range of technologies, including robotics, unmanned aerial vehicles and mixed reality.

artificial intelligence, depth map, image understanding, (15 more...)

2503.03373

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.48)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

David Eigen, Christian Puhrsch, Rob Fergus

Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

Neural Information Processing SystemsFeb-9-2025, 09:05:21 GMT

Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.

artificial intelligence, machine learning, prediction, (18 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

arXiv.org Artificial IntelligenceJan-4-2025

Journey into Automation: Image-Derived Pavement Texture Extraction and Evaluation

Lu, Bingjie, Dan, Han-Cheng, Zhang, Yichen, Huang, Zhetao

Mean texture depth (MTD) is pivotal in assessing the skid resistance of asphalt pavements and ensuring road safety. This study focuses on developing an automated system for extracting texture features and evaluating MTD based on pavement images. The contributions of this work are threefold: firstly, it proposes an economical method to acquire three-dimensional (3D) pavement texture data; secondly, it enhances 3D image processing techniques and formulates features that represent various aspects of texture; thirdly, it establishes multivariate prediction models that link these features with MTD values. Validation results demonstrate that the Gradient Boosting Tree (GBT) model achieves remarkable prediction stability and accuracy (R2 = 0.9858), and field tests indicate the superiority of the proposed method over other techniques, with relative errors below 10%. This method offers a comprehensive end-to-end solution for pavement quality evaluation, from images input to MTD predictions output.

data mining, machine learning, natural language, (19 more...)

2501.02414

Country:

Asia > South Korea (0.04)
Asia > China > Hunan Province (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Energy (0.89)
Transportation > Ground > Road (0.66)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)

Mahjourian, Nazanin, Nguyen, Vinh

Multimodal Object Detection using Depth and Image Data for Manufacturing Parts

arXiv.org Artificial IntelligenceNov-13-2024

Manufacturing requires reliable object detection methods for precise picking and handling of diverse types of manufacturing parts and components. Traditional object detection methods utilize either only 2D images from cameras or 3D data from lidars or similar 3D sensors. However, each of these sensors have weaknesses and limitations. Cameras do not have depth perception and 3D sensors typically do not carry color information. These weaknesses can undermine the reliability and robustness of industrial manufacturing systems. To address these challenges, this work proposes a multi-sensor system combining an red-green-blue (RGB) camera and a 3D point cloud sensor. The two sensors are calibrated for precise alignment of the multimodal data captured from the two hardware devices. A novel multimodal object detection method is developed to process both RGB and depth data. This object detector is based on the Faster R-CNN baseline that was originally designed to process only camera images. The results show that the multimodal model significantly outperforms the depth-only and RGB-only baselines on established object detection metrics. More specifically, the multimodal model improves mAP by 13% and raises Mean Precision by 11.8% in comparison to the RGB-only baseline. Compared to the depth-only baseline, it improves mAP by 78% and raises Mean Precision by 57%. Hence, this method facilitates more reliable and robust object detection in service to smart manufacturing applications.

detection, sensor, variant, (16 more...)

2411.09062

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)